Probabilistic K-Means using Method of Moments

Author

  • Sayantan Dasgupta
Abstract

K-means is one of the most widely used clustering algorithms in data mining applications. It attempts to minimize the sum of squared Euclidean distances between the points in each cluster and that cluster's mean. The simplicity and scalability of K-means make it very appealing. However, K-means suffers from the local minima problem and comes with no guarantee of converging to the optimal cost. K-means++ tries to address the problem by seeding the means using a distance-based sampling scheme. However, seeding the means in K-means++ needs O(K) passes through the entire dataset, which can be very costly for large datasets. Here we propose a method to extract the means using the second- and third-order moments of the data. Our method yields performance competitive with all the existing K-means algorithms, while avoiding the expensive mean selection steps of K-means++ and other heuristics. We demonstrate the performance of our algorithm in comparison with the existing algorithms on various benchmark datasets.
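To make the cost argument concrete, the following is a minimal sketch of the K-means objective and of distance-based (D²) seeding in the style of K-means++. It is an illustration of the baseline the abstract describes, not the moment-based method proposed in the paper. The function names `sq_dist`, `kmeans_cost`, and `kmeanspp_seed` are hypothetical, and the `passes` counter is added only to show that seeding K means requires K - 1 full scans of the data.

```python
import random


def sq_dist(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans_cost(points, means):
    """K-means objective: sum of squared distances to the nearest mean."""
    return sum(min(sq_dist(p, m) for m in means) for p in points)


def kmeanspp_seed(points, k, rng):
    """Pick k seeds by D^2 sampling; also return the number of data passes.

    Assumption: points is a non-empty list of equal-length tuples, k >= 1.
    """
    means = [rng.choice(points)]          # first seed: uniform at random
    d2 = [float("inf")] * len(points)     # squared distance to nearest seed
    passes = 0
    while len(means) < k:
        # One full pass over the data to refresh nearest-seed distances.
        d2 = [min(d, sq_dist(p, means[-1])) for d, p in zip(d2, points)]
        passes += 1
        if sum(d2) == 0:
            # Every point coincides with a seed; fall back to uniform choice.
            means.append(rng.choice(points))
        else:
            # Sample the next seed with probability proportional to d2.
            means.append(rng.choices(points, weights=d2, k=1)[0])
    return means, passes


if __name__ == "__main__":
    data = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
    seeds, n_passes = kmeanspp_seed(data, 2, random.Random(0))
    print(seeds, n_passes, kmeans_cost(data, seeds))
```

Each new seed needs the nearest-seed distances for every point, so the number of passes grows linearly with K. This is the cost the abstract refers to for large datasets.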

Similar resources

Probabilistic analysis of stability of chain pillars in Tabas coal mine in Iran using Monte Carlo simulation

Performing a probabilistic study rather than a deterministic one is a relatively easy way to quantify the uncertainty in an engineering design. Due to the complexity and poor accuracy of the statistical moment methods, the Monte Carlo simulation (MCS) method is widely used in engineering design. In this work, an MCS-based reliability analysis was carried out for the stability of the chain pill...

Persian Handwritten Digit Recognition Using Particle Swarm Probabilistic Neural Network

Handwritten digit recognition can be categorized as a classification problem. The Probabilistic Neural Network (PNN) is one of the most effective and useful classifiers, and it works based on Bayes' rule. In this paper, in order to recognize Persian (Farsi) handwritten digits, a combination of an intelligent clustering method and PNN has been utilized. Hoda database, which includes 80000 P...

A Novel Probabilistic Optimal Power Flow Method to Handle Large Fluctuations of Stochastic Variables

The traditional cumulant method (CM) for probabilistic optimal power flow (P-OPF) needs to linearize the Karush–Kuhn–Tucker (KKT) first-order conditions, and therefore requires the input variables (wind power or loads) to vary within small ranges. To handle large fluctuations resulting from large-scale wind power and loads, a novel P-OPF method is proposed, where the correlations among ...

Load-Frequency Control: a GA based Bayesian Networks Multi-agent System

Bayesian Networks (BN) provide a robust probabilistic method of reasoning under uncertainty. They have been successfully applied in a variety of real-world tasks, but they have received little attention in the area of load-frequency control (LFC). In practice, LFC systems use proportional-integral controllers. However, since these controllers are designed using a linear model, the nonlinearities...

A fixed point approach to the Hyers-Ulam stability of an $AQ$ functional equation in probabilistic modular spaces

In this paper, we prove the Hyers-Ulam stability in $\beta$-homogeneous probabilistic modular spaces via the fixed point method for the functional equation \[f(x+ky)+f(x-ky)=f(x+y)+f(x-y)+\frac{2(k+1)}{k}f(ky)-2(k+1)f(y)\] for fixed integers $k$ with $k\neq 0,\pm 1$.

Journal title:
  • CoRR

Volume abs/1511.05933

Publication date: 2015